Abstract

We introduce and study a new model for functional data. The ARHD is an autoregressive model in which the first order derivative of the random curves appears explicitly. Convergent estimates are obtained through an original double penalization method. The prediction method is applied to a real data set already studied in the literature.
1 Introduction

Usually time series may be viewed as the discretized observations obtained from an underlying stochastic process $(\xi(t), t \in \mathbb{R})$. Models and statistical inference on such processes aim at providing the best possible predictor. Let us assume that the process $\xi$ is observed on an interval $[0, T]$. We divide $[0, T]$ into $n$ subintervals $[i\delta, (i+1)\delta]$, $i = 0, \dots, n-1$, with $\delta = T/n$. This approach is clearly justified when $\xi$ is periodic with period $\delta$, but it may be generalized to processes that are stationary or not. In the following we consider the function-valued process $X = (X_i, i \in \mathbb{Z})$ defined by:

$$X_{i+1}(t) = \xi(i\delta + t), \qquad 0 \le t \le \delta, \quad i \in \mathbb{Z}.$$
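As an illustration of this slicing step, the following Python sketch cuts a series sampled at equally spaced times into $n$ consecutive curves. The function name `slice_into_curves` and the convention that curve $i$ holds the samples falling in $[i\delta, (i+1)\delta)$ are assumptions made for the example, not part of the original text.

```python
def slice_into_curves(samples, n):
    """Split equally spaced samples of a process on [0, T] into n curves.

    Curve number i (0-based) contains the samples observed on
    [i * delta, (i + 1) * delta), with delta = T / n.
    """
    if n <= 0 or len(samples) % n != 0:
        raise ValueError("the number of samples must be a positive multiple of n")
    m = len(samples) // n  # number of discretization points per curve
    return [samples[i * m:(i + 1) * m] for i in range(n)]


# Twelve observations cut into three curves of four points each.
curves = slice_into_curves(list(range(12)), 3)
```

With monthly data and $\delta$ equal to one year, each curve would simply be one calendar year of observations.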



For a review on statistical analysis of functional data we refer to Ramsay and Silverman (1997). In this paper we consider the prediction problem of the process $\xi$ on an entire time interval $[T, T+\delta]$, or equivalently the prediction of $X_{n+1}$ knowing $X_1, \dots, X_n$. To deal with the prediction problem Bosq (1991) introduced and studied an $H$-valued autoregressive process of order one, denoted ARH in the following, where $(H, \langle\cdot,\cdot\rangle)$ is a suitable Hilbert space of functions with inner product $\langle\cdot,\cdot\rangle$ such that $X_i \in H$ (typically $H = L^2[0,\delta]$, the space of square integrable functions on $[0,\delta]$). The model aims to generalize the classical and celebrated AR(1) model to functional data.

* Corresponding author: Departement de Mathematiques, CC 051, Universite Montpellier 2, Place Eugene Bataillon, 34095 Montpellier Cedex 5.
mas@math.univ-montp2.fr

The ARH process admits the representation

$$X_i = \rho(X_{i-1}) + \varepsilon_i$$

where $(\varepsilon_i, i \in \mathbb{Z})$ is the $H$-valued innovation process and $\rho$ is a compact linear operator such that $\|\rho\|_\infty < 1$. Note that $\|\rho\|_\infty$ stands for the classical operator norm of $\rho$ and is defined as

$$\|\rho\|_\infty = \sup_{h \in H,\ \|h\| \le 1} \|\rho(h)\|.$$

The ARH process is stationary and the best possible prediction of $X_{n+1}$ is $\tilde X_{n+1} = \rho(X_n)$ whenever $E(\varepsilon_n | X_{n-1}) = 0$. If $\rho_n$ is a consistent estimate for $\rho$, the prediction is made through $\rho_n(X_n)$. Bosq (2000) proposes such a predictor and proves its consistency under mild conditions.

Considering some regularity conditions on the sample paths, one may obtain similar results for autoregressive processes with values in other functional spaces (see Pumo (1992) for results on $C[0, 1]$, or Mourid (1995) for results on general Banach spaces). Alternative approaches to the prediction problem based on ARH modelling were proposed by Besse et al. (1996, 2000) by means of spline smoothing and, more recently, by Antoniadis and Sapatinas (2003), who implemented wavelet techniques. Furthermore, numerical studies show that the predictors obtained by these alternative methods are better than those obtained by the linear interpolation ARH predictor (Pumo (1998)).

Generalizing the classical multivariate models to functional data is a new and fruitful trend in modern statistics: linear or nonlinear regression, high-dimensional or functional ANOVA, and MA($\infty$) processes all have their "functional" counterparts. But usually, and to the authors' knowledge, these classical statistical models for functional data never involve the derivatives of the random curves rebuilt from discretized data. However, the opportunity to compute explicit first (or higher) order derivatives is one of the main features distinguishing truly functional data from multivariate data. We refer to Silverman (1996) in the framework of principal component analysis, and to Ferraty and Vieu (2003) for regression models. These authors underline the specific amount of information contained in the derivatives of curves rebuilt from functional data as well as their practical interest.

The aim of this paper is to use the functional properties of smooth sample paths in order to improve the predictor. More precisely we suppose that the sample paths belong to the Sobolev space $W^{2,1}$ (defined in the next section). We study an autoregressive model, initially introduced by Marion and Pumo (2004), whose definition explicitly involves the first order derivative of the data. This "ARHD" model is detailed below. The Wong process, that is a stationary Gaussian process with continuously differentiable paths, may be represented this way. We propose to estimate the two unknown parameters of the model by an original method. This technique is inspired by the ridge regression method and involves two overlapping penalizations through two parameters depending on each other.

The paper is organized as follows. In the following sections we introduce the ARHD(1) model, simply denoted ARHD in the sequel, and show that it is strictly stationary. In Sections 4 and 5 we give conditions for the unknown operators $\phi$ and $\Psi$ to be identifiable and provide estimates as well as asymptotic results. Section 6 is devoted to technical details about the numerical calculation of ARHD predictors and to the comparison of the ARHD predictors with various functional methods in two cases: the first one is a simulated example (the Wong process, which allows an ARHD representation); the second is a real data study concerning the El Niño-Southern Oscillation (ENSO) time series. Proofs of asymptotic results are postponed to Section 7.



2 The model 

The model was introduced by Marion and Pumo (2004). Let $(X_i)$ be a sample of random curves. We consider the following model:

$$X_{i+1} = \phi(X_i) + \Psi(X_i') + \varepsilon_{i+1} \tag{1}$$

where $\phi$ and $\Psi$ are linear operators.

Now we suppose that for all $i$, $X_i$ takes its values in the Sobolev space $W^{2,1}[0, 1]$:

$$W^{2,1} = \{u \in L^2[0,1],\ u' \in L^2[0,1]\}.$$

The space $W^{2,1}$ is a separable Hilbert space endowed with the scalar product:

$$\langle u, v\rangle_W = \int_0^1 u(t)\,v(t)\,dt + \int_0^1 u'(t)\,v'(t)\,dt.$$

We refer to Ziemer (1989) or to Adams and Fournier (2003) for monographs dedicated to Sobolev spaces. In the sequel $W^{2,1}$ will be denoted $W$ and $W^{2,0} = L^2$ will be denoted $L$ for the sake of simplicity. Obviously if we set $Du = u'$ then $D$ maps $W$ onto $L$ ($D$ is the ordinary differential operator). Furthermore Sobolev's imbedding theorem ensures that (see Adams and Fournier (2003), Theorem 4.12, p. 85)

$$\|Du\|_L \le C\,\|u\|_W$$

(where $C$ is some constant which does not depend on $u$), i.e. $D$ is a bounded operator from $W$ to $L$.
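The scalar product of $W$ and the bound on $D$ can be checked numerically on a simple curve. The sketch below is only illustrative: the helper names `l2_inner` and `sobolev_inner`, the trapezoidal quadrature and the test function $u(t) = \sin(2\pi t)$ are choices made here, and the derivative is supplied analytically rather than estimated from data.

```python
import math


def l2_inner(f, g, num_points=2001):
    """Trapezoidal approximation of the L2[0, 1] inner product of f and g."""
    h = 1.0 / (num_points - 1)
    values = [f(k * h) * g(k * h) for k in range(num_points)]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def sobolev_inner(u, du, v, dv, num_points=2001):
    """Inner product of W: L2 product of the curves plus L2 product of the derivatives."""
    return l2_inner(u, v, num_points) + l2_inner(du, dv, num_points)


def u(t):
    return math.sin(2.0 * math.pi * t)


def du(t):
    return 2.0 * math.pi * math.cos(2.0 * math.pi * t)


norm_w_squared = sobolev_inner(u, du, u, du)  # exact value: 1/2 + 2 * pi**2
norm_du_squared = l2_inner(du, du)            # squared L norm of Du
```

For this inner product the squared norm of $Du$ in $L$ is one of the two terms of the squared norm of $u$ in $W$, so the inequality holds here with $C = 1$.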

From now on we assume that $\phi$ is a compact operator from $W$ to $W$ and $\Psi$ is a compact operator from $L$ to $W$. For a review on compact operators we refer to Dunford and Schwartz (1988) or Gohberg, Goldberg and Kaashoek (1991).



3 ARW representation of the ARHD process. 

From the above paragraph we know that $\phi + \Psi D$ is a well defined operator from $W$ to $W$, that $\Psi D$ is a compact operator as the product of a bounded and a compact operator, and consequently that $\phi + \Psi D$ is itself compact as the sum of two compact operators. We can rewrite (1) as:

$$X_{i+1} = A(X_i) + \varepsilon_{i+1} \tag{2}$$

where $A = \phi + \Psi D$.

Finally the ARHD process may be rewritten as a special ARH(l) process with values 
in W. 

The trouble with (2) is the following: the parameters $\phi$ and $\Psi$ are hidden behind $A$, and it is not $A$ itself that we want to infer. Obviously we are going to face two issues:

• Studying the identifiability of $\phi$ and $\Psi$ in the model above.

• Providing a consistent estimation procedure for $\phi$ and $\Psi$ before forecasting.

From now on we suppose that

H1: $\|A\|_\infty < 1$,

H2: $\|X\|_W < +\infty$ a.s.

The first assumption is crucial for the stationarity of the process. The second is quite restrictive and could be relaxed to mild moment assumptions, but it makes the proofs of the main result more easily readable. This assumption appears for instance in Cardot, Ferraty and Sarda (1999) for the same reasons. We now state a first property of the process, which will be useful in the sequel.

Proposition 3.1 When assumptions H1 and H2 hold, $(X_i)_{i \in \mathbb{Z}}$ and $(X_i')_{i \in \mathbb{Z}}$ are strictly stationary sequences on $W$ and $L$ respectively.

The stationarity of $(X_i)_{i \in \mathbb{Z}}$ is a simple consequence of the representation (2) and of previous results obtained for instance by Bosq (2000), Chapter 3. The continuity of $D$ on $W$ implies the stationarity of the sequence $(X_i')_{i \in \mathbb{Z}}$.
4 Estimation procedure



4.1 The moment method 

From a practical point of view the Sobolev setting is not really a constraint. It is well known that either splines or wavelets provide standard reconstruction methods (from the discretized data) yielding functions in $W^{2,1}$.

The model is purely functional: we cannot invoke any likelihood-based technique since the "density of a random curve" makes no sense (Lebesgue's measure does not exist on infinite dimensional spaces). We propose to start from a classical moment method and to adapt it to our setting.

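By $\mathcal{C}_H$ (resp. $\mathcal{C}_{HH'}$) we denote the space of compact operators on the Hilbert space $H$ (resp. mapping the Hilbert space $H$ into $H'$). Some finite rank operators are defined by means of the tensor product: if $u$ and $v$ belong to $H$ and $H'$ respectively, $u \otimes_H v$ is the operator defined on $H$ by: for all $h \in H$,

$$(u \otimes_H v)(h) = \langle u, h\rangle_H\, v.$$

We start from a sample $(X_i, X_i')_{1 \le i \le n}$ and we denote

$$\Gamma = E(X_0 \otimes_W X_0), \qquad \Gamma' = E(X_0 \otimes_W X_0'),$$
$$\Gamma'^* = E(X_0' \otimes_L X_0), \qquad \Gamma'' = E(X_0' \otimes_L X_0'),$$
$$\Delta = E(X_0 \otimes_W X_1), \qquad \Delta' = E(X_0' \otimes_L X_1).$$

Under assumption H2 all these operators belong either to $\mathcal{C}_W$, $\mathcal{C}_{WL}$, $\mathcal{C}_{LW}$ or $\mathcal{C}_L$. In fact assumption H2 could be replaced by $E\|X\|_W^2 < +\infty$.

By $\Gamma_n$, $\Gamma_n'$, $\Gamma_n'^*$, $\Gamma_n''$, $\Delta_n$, $\Delta_n'$ we denote the empirical counterparts of these operators based on the sample $(X_i, X_i')_{1 \le i \le n}$. For example:

$$\Gamma_n = \frac{1}{n}\sum_{k=1}^{n} X_k \otimes_W X_k, \tag{3}$$

$$\Delta_n = \frac{1}{n-1}\sum_{k=1}^{n-1} X_k \otimes_W X_{k+1}.$$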

Remark 4.1 The notation $\Gamma'^*$ is not ambiguous: $\Gamma'^*$ is truly the adjoint operator of $\Gamma'$.

Remark 4.2 Conversely, while the random function $X'$ is truly the derivative of $X$, this is no longer the case as far as linear operators are concerned: $\Gamma'$ is not the derivative of $\Gamma$. The prime is just a notation in this setting; a derivative of an operator would make no sense anyway. However, it should be remarked that for all $u$ in $L$

$$(\Gamma'^*(u))' = \left(E(\langle u, X'\rangle_L\, X)\right)' = \Gamma''(u)$$

and for all $v$ in $W$

$$(\Gamma(v))' = \left(E(\langle v, X\rangle_W\, X)\right)' = \Gamma'(v).$$



Quite naturally, from (1), applying $\langle X_i, \cdot\rangle_W$ and $\langle X_i', \cdot\rangle_L$ successively and then taking expectations, we easily deduce both moment equations:

$$\begin{cases} \Delta = \phi\Gamma + \Psi\Gamma' \\ \Delta' = \phi\Gamma'^* + \Psi\Gamma''. \end{cases} \tag{4}$$

Resolving this system is apparently easy but we should be aware of two facts:

• Operators (here $\Delta, \phi, \Gamma, \dots$) do not commute!

• The inverse operators of $\Gamma$ and $\Gamma''$ do not necessarily exist and when they do, they are unbounded, i.e. not continuous (recall that $\Gamma$ and $\Gamma''$ are compact operators and that compact operators have no bounded inverses).

4.2 Identifiability

At this point, before trying to solve (4), we need to study the identifiability of the unknown infinite dimensional parameter $(\phi, \Psi) \in \mathcal{C}_W \times \mathcal{C}_{LW}$ in our statistical problem.

We set $\mathcal{S} = \mathcal{C}_W \times \mathcal{C}_{LW}$. If both equations in (4) are the starting point we should make sure that solutions to these equations are well and uniquely defined. Suppose for instance that $\mathrm{Ker}\,\Gamma \ne \{0\}$ and take $h$ in it. Now set $\tilde\phi = \phi + h \otimes_W h$. Then

$$\tilde\phi\,\Gamma = \phi\Gamma + (h \otimes_W h)\,\Gamma$$

but $(h \otimes_W h)\Gamma = 0$. So $\tilde\phi\Gamma = \phi\Gamma$ and $\phi$ is not unique (there are even infinitely many solutions in the space $\phi + \mathrm{Ker}\,\Gamma$). The next assumption is

H3: $\mathrm{Ker}\,\Gamma = \mathrm{Ker}\,\Gamma'' = \{0\}$.

In other words we suppose that both operators above are one to one. Now turning back to (4) we rewrite the system. Equivalently:

$$(\Delta, \Delta') = (\phi, \Psi)\,\Lambda, \qquad \Lambda = \begin{pmatrix} \Gamma & \Gamma'^* \\ \Gamma' & \Gamma'' \end{pmatrix}. \tag{5}$$

We are now ready to solve the identification problem.
Proposition 4.1 The couple $(\phi, \Psi) \in \mathcal{S}$ is identifiable for the moment method proposed in (4) if and only if $(\phi, \Psi) \notin \mathcal{N}$, where $\mathcal{N}$ is the vector subspace of $\mathcal{S}$ defined by

$$\mathcal{N} = \{(U, V) \in \mathcal{S} : U + VD = 0\}.$$

Note that $\mathcal{N}$ is a closed set in $\mathcal{S}$.

In other words if, for all $u$ in $W$, $\phi u + \Psi u' = 0$, the parameter $(\phi, \Psi)$ cannot be identified.

The Proposition is proved at the beginning of the last section of the paper.

5 Definition of the estimates and convergence 

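Since the unknown parameters $\phi$ and $\Psi$ are operators, estimating them means dealing with random operators based on the double sample $(X_i, X_i')_{1 \le i \le n}$. We refer for instance to (3) above for examples of these available operators.

The estimates stem from (5), which is a highly non-invertible system. Classically, we add a small perturbation to regularize it and make it invertible. We solve:

$$S' : \begin{cases} \Delta = \phi\,(\Gamma + \alpha I_W) + \Psi\Gamma' \\ \Delta' = \phi\Gamma'^* + \Psi\,(\Gamma'' + \alpha I_L) \end{cases}$$

where $\alpha$ is a positive real number and $I_W$ (resp. $I_L$) denotes the identity operator on $W$ (resp. $L$). Now the operators $(\Gamma + \alpha I_W)$ and $(\Gamma'' + \alpha I_L)$ are no longer compact but have bounded inverses. Basic algebra gives:

$$\begin{cases} \phi\left[(\Gamma + \alpha I_W) - \Gamma'^*(\Gamma'' + \alpha I_L)^{-1}\Gamma'\right] = \Delta - \Delta'(\Gamma'' + \alpha I_L)^{-1}\Gamma' \\ \Psi\left[(\Gamma'' + \alpha I_L) - \Gamma'(\Gamma + \alpha I_W)^{-1}\Gamma'^*\right] = \Delta' - \Delta(\Gamma + \alpha I_W)^{-1}\Gamma'^*, \end{cases}$$

which is then once more approximated by:

$$S'' : \begin{cases} \phi\left[\Gamma - \Gamma'^*(\Gamma'' + \alpha I_L)^{-1}\Gamma'\right] = \Delta - \Delta'(\Gamma'' + \alpha I_L)^{-1}\Gamma' \\ \Psi\left[\Gamma'' - \Gamma'(\Gamma + \alpha I_W)^{-1}\Gamma'^*\right] = \Delta' - \Delta(\Gamma + \alpha I_W)^{-1}\Gamma'^*. \end{cases}$$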
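The "basic algebra" behind the first line of the solved system can be spelled out as a routine elimination of $\Psi$ between the two equations of $S'$. The intermediate expression for $\Psi$ below is introduced only for the purpose of this computation; the second line is obtained symmetrically by eliminating $\phi$.

```latex
\begin{aligned}
\Psi &= (\Delta' - \phi\Gamma'^{*})(\Gamma'' + \alpha I_L)^{-1}
  && \text{(second equation of } S'\text{)} \\
\Delta &= \phi(\Gamma + \alpha I_W)
  + (\Delta' - \phi\Gamma'^{*})(\Gamma'' + \alpha I_L)^{-1}\Gamma'
  && \text{(substitution into the first equation)} \\
\Delta - \Delta'(\Gamma'' + \alpha I_L)^{-1}\Gamma'
  &= \phi\left[(\Gamma + \alpha I_W)
  - \Gamma'^{*}(\Gamma'' + \alpha I_L)^{-1}\Gamma'\right].
\end{aligned}
```

Note that the order of the factors is kept throughout, since the operators involved do not commute.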



We just dropped $\alpha I_W$ on the first line and $\alpha I_L$ on the second to get $S''$. Take the first line in the above display. The operator

$$S_\phi = \Gamma - \Gamma'^*(\Gamma'' + \alpha I_L)^{-1}\Gamma' \tag{8}$$

is a selfadjoint compact operator. (Indeed $\Gamma'^*(\Gamma'' + \alpha I_L)^{-1}\Gamma'$ is a compact operator because $\Gamma'$ and $\Gamma'^*$ are.) We may deduce from this fact that $S_\phi$ has real eigenvalues (not necessarily positive) and furthermore that once again $S_\phi$ has no bounded inverse. The same remarks hold for

$$S_\Psi = \Gamma'' - \Gamma'(\Gamma + \alpha I_W)^{-1}\Gamma'^*. \tag{9}$$

However we can provide an approximate solution to $S''$ by regularizing $S_\phi$ and $S_\Psi$, once more by a penalization method. Finally the pseudo-solutions we propose to solve $S''$, hence $S'$, are based on a second strictly positive parameter $\beta$ and are denoted $\tilde\phi$ and $\tilde\Psi$:

$$\tilde\phi = \left[\Delta - \Delta'(\Gamma'' + \alpha I_L)^{-1}\Gamma'\right](S_\phi + \beta I_W)^{-1},$$

$$\tilde\Psi = \left[\Delta' - \Delta(\Gamma + \alpha I_W)^{-1}\Gamma'^*\right](S_\Psi + \beta I_L)^{-1}.$$

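This new system defines relations from which we propose to deduce estimates. From now on, in order to lighten the notation, by $S^\dagger$ we denote the operator defined by $(S + \alpha_n I)^{-1}$ where $\alpha_n$ is a non-increasing sequence of positive numbers decaying to zero. We set:

$$S_{n,\phi} = \Gamma_n - \Gamma_n'^*\,(\Gamma_n''^\dagger)\,\Gamma_n', \tag{11}$$
$$S_{n,\Psi} = \Gamma_n'' - \Gamma_n'\,(\Gamma_n^\dagger)\,\Gamma_n'^*, \tag{12}$$
$$T_{n,\phi} = \Delta_n - \Delta_n'\,(\Gamma_n''^\dagger)\,\Gamma_n', \tag{13}$$
$$T_{n,\Psi} = \Delta_n' - \Delta_n\,(\Gamma_n^\dagger)\,\Gamma_n'^*. \tag{14}$$

Taking $\beta_n \downarrow 0$ we obtain the following.

Definition 5.1 The estimate of the couple $(\phi, \Psi)$ is $(\phi_n, \Psi_n)$, based on (11-14) and defined by:

$$\phi_n = T_{n,\phi}\,(S_{n,\phi} + \beta_n I_W)^{-1}, \qquad \Psi_n = T_{n,\Psi}\,(S_{n,\Psi} + \beta_n I_L)^{-1}. \tag{15}$$

The next Theorem is the main theoretical result of this article. It provides the convergence of our estimates when the sample size goes to infinity.

Theorem 5.1 When H1-H3 hold and if $\alpha_n \to 0$, $\beta_n \to 0$ with $\sqrt{n}\,\alpha_n^2\beta_n^2 \to +\infty$ and $\sqrt{\alpha_n}/\beta_n \to 0$,

$$\phi_n \xrightarrow{P} \phi, \qquad \Psi_n \xrightarrow{P} \Psi.$$

The convergence is understood in the $\|\cdot\|_\infty$ norm for bounded operators. Note that Theorem 5.1 holds whenever $\alpha_n = n^{-a}$ and $\beta_n = n^{-b}$ with $b < a/2$ and $2b + 2a < 1/2$.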
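The link between the polynomial rates and the two conditions of the Theorem can be checked by hand: with $\alpha_n = n^{-a}$ and $\beta_n = n^{-b}$, $\sqrt{n}\,\alpha_n^2\beta_n^2 = n^{1/2 - 2a - 2b}$ and $\sqrt{\alpha_n}/\beta_n = n^{b - a/2}$. The short Python sketch below encodes this check; the function names and the example exponents $a = 0.15$, $b = 0.05$ are illustrative choices, not values taken from the paper.

```python
def rates_satisfy_theorem(a, b):
    """Exponent conditions for alpha_n = n**(-a) and beta_n = n**(-b)."""
    return a > 0 and b > 0 and b < a / 2 and 2 * a + 2 * b < 0.5


def growth_term(n, a, b):
    """sqrt(n) * alpha_n**2 * beta_n**2, which has to diverge."""
    alpha, beta = n ** (-a), n ** (-b)
    return n ** 0.5 * alpha ** 2 * beta ** 2


def bias_term(n, a, b):
    """sqrt(alpha_n) / beta_n, which has to vanish."""
    alpha, beta = n ** (-a), n ** (-b)
    return alpha ** 0.5 / beta


ok = rates_satisfy_theorem(0.15, 0.05)       # admissible pair of exponents
not_ok = rates_satisfy_theorem(0.20, 0.10)   # b = a / 2 is not allowed
```

For an admissible pair, `growth_term` increases with the sample size while `bias_term` decreases, although both move slowly because the exponents are small.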

Remark 5.1 Originally the linear model (1) is subject to serious multicollinearity troubles since $X_n' = DX_n$. Even if the curve $X_n'$ usually looks quite different from $X_n$, there is a total stochastic dependence between them. The method used in this article to tackle this problem (as well as the intrinsic "inverse problem" aspects related to the inversion of the covariance operators $\Gamma$ and $\Gamma''$) is new to the authors' knowledge. As can be seen in the displays above or in the proofs below, it relies on a double penalization technique, first by the index $\alpha_n$ then by $\beta_n$, linking both indexes in order to asymptotically suppress the bias terms.



6 A numerical study and application: ENSO 



In this section we illustrate the ARHD prediction method proposed in this paper by numerical studies on two examples. We first give some technical results needed to carry out the numerical calculations. The first application is connected to Wong's process (see Wong (1966)), which admits an ARHD representation. We compare the ARHD predictor with various predictors based on the notion of ARH process, that is, the linear interpolation ARH predictor (Pumo (1998)), the Fourier interpolation ARF predictor and the ARW predictor based on the representation (2), by two statistical criteria: mean-squared error (MSE) and relative mean-absolute error (RMAE), defined by:

$$\mathrm{MSE} = \frac{1}{m}\sum_{j=1}^{m}\left(X_n(t_j) - \hat X_n(t_j)\right)^2, \qquad \mathrm{RMAE} = \frac{1}{m}\sum_{j=1}^{m}\frac{|X_n(t_j) - \hat X_n(t_j)|}{|X_n(t_j)|}$$

where $m$ is the number of discretization points and $\hat X_n$ denotes the predicted curve.
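Both criteria are straightforward to compute from the observed and predicted values at the $m$ discretization points. The following Python sketch is one possible implementation; the function names `mse` and `rmae` and the toy numbers are illustrative, and the RMAE assumes that no observed value is exactly zero.

```python
def mse(observed, predicted):
    """Mean-squared error over the m discretization points."""
    m = len(observed)
    return sum((x - p) ** 2 for x, p in zip(observed, predicted)) / m


def rmae(observed, predicted):
    """Relative mean-absolute error; observed values must be nonzero."""
    m = len(observed)
    return sum(abs(x - p) / abs(x) for x, p in zip(observed, predicted)) / m


observed = [1.0, 2.0, 4.0]
predicted = [1.0, 1.0, 5.0]
example_mse = mse(observed, predicted)    # (0 + 1 + 1) / 3
example_rmae = rmae(observed, predicted)  # (0 + 1/2 + 1/4) / 3
```

Multiplying the RMAE by 100 gives the percentages reported for the sea surface temperature data.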

The second example concerns real data, namely a climatological time series describing the El Niño-Southern Oscillation (ENSO). We compare our predictor with predictors based on similar approaches found in the literature: the spline smoothing FAR predictor (Besse and Cardot (1996)), the Local FAR predictor (Besse et al. (2000)) and the wavelet-based predictor (Antoniadis and Sapatinas (2003)).



6.1 Some technical details about simulations 

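Consider the Fourier basis on $L^2[0,\delta]$ and denote $e_0(t) = 1/\sqrt{\delta}$ and $e_{2j-1}(t) = \sqrt{2/\delta}\,\cos(2j\pi t/\delta)$, $e_{2j}(t) = \sqrt{2/\delta}\,\sin(2j\pi t/\delta)$ for $j \ge 1$. Then a simple calculation shows that

$$w = \left\{e_0,\ [1 + 4j^2\pi^2/\delta^2]^{-1/2}\, e_{2j-1},\ [1 + 4j^2\pi^2/\delta^2]^{-1/2}\, e_{2j},\ j \ge 1\right\}$$

is an orthonormal basis for $W$. Let $f = \sum_{j=0}^{\infty} c_j e_j$, where $c_j = \langle f, e_j\rangle_{L^2}$, be the Fourier series of a continuously differentiable function $f$. Then $f' = \sum_{j=0}^{\infty} c_j e_j'$. Furthermore, since $\langle e_{2j-1}, e_{2j-1}\rangle_W = \langle e_{2j}, e_{2j}\rangle_W = 1 + 4j^2\pi^2/\delta^2$, the decomposition $\sum_{j=0}^{\infty}\langle f, w_j\rangle_W\, w_j$ of $f$ on $w$ is given by $c_0 w_0 + \sum_{j=1}^{\infty}\langle f, w_j\rangle_W\, w_j$ where

$$\langle f, w_{2j-1}\rangle_W = [1 + 4j^2\pi^2/\delta^2]^{1/2}\,\langle f, e_{2j-1}\rangle_{L^2} \quad\text{and}\quad \langle f, w_{2j}\rangle_W = [1 + 4j^2\pi^2/\delta^2]^{1/2}\,\langle f, e_{2j}\rangle_{L^2}.$$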
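In practice the coordinates of a curve in the basis $w$, and the Fourier coefficients of its derivative, are obtained from its ordinary Fourier coefficients by simple rescalings. The sketch below assumes an odd number of coefficients ordered as $c_0, c_1, c_2, \dots$ (cosine terms at odd indices, sine terms at even indices) and uses that $\langle e_k, e_k\rangle_W = 1 + 4j^2\pi^2/\delta^2$ for $k = 2j-1$ or $k = 2j$; the function names are hypothetical.

```python
import math


def w_coefficients(c, delta):
    """Map L2 Fourier coefficients <f, e_k> to the coordinates <f, w_k>_W."""
    out = [c[0]]
    for k in range(1, len(c)):
        j = (k + 1) // 2  # frequency index: k = 2j - 1 (cosine) or k = 2j (sine)
        scale = math.sqrt(1.0 + (2.0 * j * math.pi / delta) ** 2)
        out.append(scale * c[k])
    return out


def derivative_coefficients(c, delta):
    """L2 Fourier coefficients of f', obtained by term-by-term differentiation."""
    if len(c) % 2 == 0:
        raise ValueError("an odd number of coefficients is expected")
    d = [0.0] * len(c)
    for j in range(1, (len(c) - 1) // 2 + 1):
        freq = 2.0 * j * math.pi / delta
        d[2 * j - 1] = freq * c[2 * j]    # the sine term differentiates to a cosine
        d[2 * j] = -freq * c[2 * j - 1]   # the cosine term differentiates to minus a sine
    return d


delta = 2.0 * math.pi  # with this period the angular frequencies are 1, 2, ...
w_coords = w_coefficients([1.0, 1.0, 1.0, 1.0, 1.0], delta)
d_coords = derivative_coefficients([3.0, 1.0, 2.0], delta)
```

The odd length mirrors the requirement, made in the text, that the truncation order $N$ be odd, so that every cosine term comes with its sine partner.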
In order to calculate the covariance operators given in Section 4, denote by $\mathbf{w}_N$ (resp. $\mathbf{e}_N$) the $N$-vectors of the basis on $W$ (resp. $L^2$), that is $\mathbf{w}_N = {}^t(w_0, w_1, \dots, w_{N-1})$ (resp. $\mathbf{e}_N = {}^t(e_0, e_1, \dots, e_{N-1})$), and suppose that $N$ is an odd and positive number. Denote by $\mathbf{X}$ and $\mathbf{X}'$ the $N \times n$ matrices:

$$\mathbf{X} = \left(\langle X_i, w_{k-1}\rangle_W\right), \quad k = 1, \dots, N;\ i = 1, \dots, n,$$
$$\mathbf{X}' = \left(\langle X_i', e_{k-1}\rangle_{L^2}\right), \quad k = 1, \dots, N;\ i = 1, \dots, n.$$

As noted above the coefficients $\langle X_i, w_{k-1}\rangle_W$ and $\langle X_i', e_{k-1}\rangle_{L^2}$ are obtained directly from the Fourier decomposition of $X_i$, for $i = 1, \dots, n$.

It follows that the covariance operators $\Gamma_n$, $\Gamma_n'$, $\Gamma_n'^*$, $\Gamma_n''$, $\Delta_n$, $\Delta_n'$ can be approximated by:

$$C_n = (1/n)\,\mathbf{X}\,({}^t\mathbf{X}), \qquad C_n' = (1/n)\,\mathbf{X}'\,({}^t\mathbf{X}),$$
$$C_n'^* = (1/n)\,\mathbf{X}\,({}^t\mathbf{X}'), \qquad C_n'' = (1/n)\,\mathbf{X}'\,({}^t\mathbf{X}'),$$
$$D_n = (1/[n-1])\,\mathbf{X}_{-1}\,({}^t\mathbf{X}_{-n}), \qquad D_n' = (1/[n-1])\,\mathbf{X}_{-1}\,({}^t\mathbf{X}'_{-n})$$

where $\mathbf{X}'_{-n}$ (resp. $\mathbf{X}_{-1}$) is the matrix $\mathbf{X}'$ (resp. $\mathbf{X}$) without the column $n$ (resp. 1). So in order to obtain the estimators given in Section 5 it suffices to substitute the covariance operators in (11-14) and (15) by their approximations given above and to choose suitable values for $\alpha_n$ and $\beta_n$.
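The substitution described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `arhd_estimates` and `arhd_predict` are hypothetical, the coefficient matrices are assumed to be stored with one column per curve, and plain matrix inversion is used for readability.

```python
import numpy as np


def arhd_estimates(X, Xp, alpha, beta):
    """Matrix versions of the estimates (phi_n, Psi_n).

    X, Xp : (N, n) arrays; column i holds the coefficients of X_i and of X_i'.
    alpha, beta : the two positive penalization parameters.
    """
    N, n = X.shape
    I = np.eye(N)
    C = X @ X.T / n                          # approximates Gamma_n
    Cp = Xp @ X.T / n                        # approximates Gamma'_n
    Cps = X @ Xp.T / n                       # approximates Gamma'*_n
    Cpp = Xp @ Xp.T / n                      # approximates Gamma''_n
    D = X[:, 1:] @ X[:, :-1].T / (n - 1)     # approximates Delta_n
    Dp = X[:, 1:] @ Xp[:, :-1].T / (n - 1)   # approximates Delta'_n

    G_dag = np.linalg.inv(C + alpha * I)     # first penalization (alpha)
    Gpp_dag = np.linalg.inv(Cpp + alpha * I)

    S_phi = C - Cps @ Gpp_dag @ Cp
    S_psi = Cpp - Cp @ G_dag @ Cps
    T_phi = D - Dp @ Gpp_dag @ Cp
    T_psi = Dp - D @ G_dag @ Cps

    phi_n = T_phi @ np.linalg.inv(S_phi + beta * I)   # second penalization (beta)
    psi_n = T_psi @ np.linalg.inv(S_psi + beta * I)
    return phi_n, psi_n


def arhd_predict(phi_n, psi_n, x_last, xp_last):
    """Coefficients of the predicted curve from the last curve and its derivative."""
    return phi_n @ x_last + psi_n @ xp_last
```

A simple sanity check of the sketch: when the derivative coefficients are all zero, $\Psi_n$ vanishes and $\phi_n$ reduces to the ridge-type estimate $D_n (C_n + \beta I)^{-1}$.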

6.2 Wong process 

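This process is defined for $t \in \mathbb{R}$ by:

$$\xi_t = \sqrt{3}\,\exp(-\sqrt{3}\,t)\int_0^{\exp(2t/\sqrt{3})} W_s\,ds,$$

where $(W_s)$ is a standard Wiener process. The Wong process is a mean-square differentiable stationary Gaussian process which is zero-mean and with variance 1. Let $\delta > 0$ and $X_i \in W$ be given by $X_{i+1}(t) = \xi_{i\delta + t}$ for $t \in (0, \delta]$. Let $\varepsilon_{i+1}$ be a mean-square differentiable r.v. with values in $W^{2,1}[0,\delta]$:

$$\varepsilon_{i+1}(t) = \sqrt{3}\,\exp\left(-\sqrt{3}\,(i\delta + t)\right)\int_{\exp(2i\delta/\sqrt{3})}^{\exp(2(i\delta+t)/\sqrt{3})}\left(W_s - W_{\exp(2i\delta/\sqrt{3})}\right)ds. \tag{16}$$

Then the process $(X_i, i \in \mathbb{Z})$ can be written as:

$$X_{i+1} = [\phi + \Psi(D)]X_i + \varepsilon_{i+1} \tag{17}$$

where $c(t) = \frac{\sqrt{3}}{2}\,\exp(-\sqrt{3}\,t)\,\{\exp(2t/\sqrt{3}) - 1\}$ and:

$$[\phi(f)](t) = \left[\exp(-\sqrt{3}\,t) + \sqrt{3}\,c(t)\right]f(\delta), \qquad [\Psi(f')](t) = c(t)\,f'(\delta).$$

Furthermore $\varepsilon_{i+1}$ is independent of $X_i, X_i'$ and a direct calculation shows that assumption H1 is satisfied.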
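The two weight functions appearing in $\phi$ and $\Psi$ are easy to evaluate. The Python sketch below computes them and compares the weight of $\phi$ with the closed form $\frac{3}{2}e^{-t/\sqrt{3}} - \frac{1}{2}e^{-\sqrt{3}t}$, which is what one gets by expanding $\exp(-\sqrt{3}t) + \sqrt{3}c(t)$; this closed form is a short computation added here, not a formula stated in the text, and the function names are illustrative.

```python
import math

SQRT3 = math.sqrt(3.0)


def c(t):
    """Weight applied to the derivative at the end of the previous curve."""
    return (SQRT3 / 2.0) * math.exp(-SQRT3 * t) * (math.exp(2.0 * t / SQRT3) - 1.0)


def phi_weight(t):
    """Weight applied to the value at the end of the previous curve."""
    return math.exp(-SQRT3 * t) + SQRT3 * c(t)


def phi_weight_closed_form(t):
    """Expanded form of phi_weight, used only as a cross-check."""
    return 1.5 * math.exp(-t / SQRT3) - 0.5 * math.exp(-SQRT3 * t)


weights_at_zero = (phi_weight(0.0), c(0.0))  # continuity at the junction: (1, 0)
```

At $t = 0$ the weights are 1 and 0, so the new curve starts exactly where the previous one ended, as it should for a continuous process.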

Using a method presented in Blanke and Pumo (2003) we simulated a Wong process $\xi(t)$ on $[0, 192.65]$, which corresponds to $n = 105$ intervals of length $\delta = 1.8348$, each known at $m = 50$ equidistant points $t_j$, $j = 1, \dots, 50$. A simulated process is presented in Figure 1. The associated process $(X_i)$ is a $W^{2,1}[0, 1.8348]$-valued process.

Please insert here Figure 1 

Fifty Wong processes were simulated and for each of them we calculated the MSE and RMAE criteria. The mean values of the two criteria over the 50 simulations for the various predictors are presented in Table 1. Figure 2 presents the different predictions for one of the simulations.

For the calculation of the ARHD predictors we consider two values for the parameter $\alpha_n$, namely 0.3 and 0.1. The corresponding values for $\beta_n$ are 0.65 and 0.5. In the calculation of the ARH, ARF and ARW predictors we consider $k_n = 1$, that is, the dimension of the projection subspace is equal to one (see Pumo (1998) for details). Simulations show that the ARW and ARF predictors are very similar and that when $m$ is large (this is the case for example when $m = 50$) they give results similar to the linear interpolation ARH predictor. But the three predictors perform worse than the ARHD predictor. Notice also that the choice of the optimal values for the parameters $\alpha_n$ and $\beta_n$ may be made by a cross-validation procedure.

Please insert here Table 1 
Please insert here Figure 2 



6.3 Example SST: Sea Surface Temperature 

The second example concerns a climatological time series describing the El Niño-Southern Oscillation (see for example Besse et al. (2000) or Smith et al. (1996) for a description of the data^1). The series gives the monthly mean El Niño sea surface temperature index from January 1950 to December 1996, that is $m = 12$, and is presented in Figure 3. We compare the ARHD predictor with various functional prediction methods.

Please insert here Figure 3 

In the first numerical study we compare the predictions of the temperature during 1986 knowing the data until 1985. We calculated the ARHD predictor with $\alpha_n = 0.4$ and $0.1$ and $\beta_n = 0.8$ and $0.4$. The MSE and RMAE criteria for various functional predictors are given in Table 2. Results show that the best methods are Wavelet II (one of the wavelet approaches proposed in Antoniadis and Sapatinas) and spline smoothing FAR. Nevertheless our predictor is better than the other predictors or the classical SARIMA $(0,1,1) \times (1,0,1)_{12}$ model (see for example Brockwell and Davis (1987)). Figure 4 displays the observed data during 1986 and the predictions given by some of the predictors discussed above. Notice that the ARF or ARW predictors are not satisfactory as $m = 12$.

^1 Data is freely available from http://www.cpc.ncep.noaa.gov/data/indices/index.html

In the second numerical study we make 10 one-year-ahead forecasts for the period 1987-96. The statistical criteria for various functional methods are presented in Table 3. The reader may notice that the ARHD method gives predictions similar to those of the Local FAR method, which is the best functional prediction method appearing in Besse et al. (2000). Note finally that, as described in the introduction of this section, the computational effort needed to obtain an ARHD predictor is comparable to that needed for an ARH predictor.



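Please insert here Table 2

Please insert here Figure 4
Please insert here Table 3

7 Proofs

Proof of Proposition 4.1. Denote by $\Lambda$ the operator matrix with rows $(\Gamma, \Gamma'^*)$ and $(\Gamma', \Gamma'')$, so that the moment equations read $(\Delta, \Delta') = (\phi, \Psi)\Lambda$. The couple $(\phi, \Psi)$ will be identified whenever, for any other couple $(\tilde\phi, \tilde\Psi)$, $(\tilde\phi, \tilde\Psi)\Lambda = (\phi, \Psi)\Lambda$ implies $(\tilde\phi, \tilde\Psi) = (\phi, \Psi)$. This will be true if

$$\{(U, V) \in \mathcal{S} : (U, V)\Lambda = 0\} = \{0\}.$$

But $\Lambda$ may be decomposed as the product of three operators, namely:

$$\Lambda = \begin{pmatrix} \Gamma & \Gamma'^* \\ \Gamma' & \Gamma'' \end{pmatrix} = \begin{pmatrix} I \\ D \end{pmatrix}\Gamma\,\begin{pmatrix} I & D^* \end{pmatrix}.$$

As $\Gamma$ is one to one by assumption H3, and since

$$(T, TD^*) = 0 \quad\text{iff}\quad T = 0,$$

it is readily seen that

$$(U, V)\Lambda = 0 \quad\text{iff}\quad U + VD = 0,$$

which finishes the proof of the Proposition.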

We begin with five Lemmas needed to prove Theorem 5.1.
Lemma 7.1

$$\|\Gamma_n - \Gamma\|_\infty = O_P\left(\frac{1}{\sqrt{n}}\right), \quad \|\Gamma_n' - \Gamma'\|_\infty = O_P\left(\frac{1}{\sqrt{n}}\right), \quad \|\Gamma_n'' - \Gamma''\|_\infty = O_P\left(\frac{1}{\sqrt{n}}\right).$$

Proof: Since $(X_n)$ is an ARH(1) process (with autocorrelation operator $A$), we can invoke for instance Theorem 4.1 p. 98 in Bosq (2000) to get the first rate of decay. All the other results above are due to the boundedness (in our framework) of the differential operator $D$. Indeed for instance $\Gamma_n'' = D\Gamma_n D^*$.



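Lemma 7.2

$$\|\Gamma''^\dagger\|_\infty \le \frac{1}{\alpha}, \qquad \|\Gamma_n''^\dagger - \Gamma''^\dagger\|_\infty = O_P\left(\frac{1}{\alpha^2\sqrt{n}}\right).$$

Proof: We prove the first bound

$$\|\Gamma''^\dagger\|_\infty = \left\|(\Gamma'' + \alpha I)^{-1}\right\|_\infty \le \frac{1}{\alpha}.$$

As $\Gamma''$ is a positive compact operator, the norm of the operator $(\Gamma'' + \alpha I)^{-1}$, which is known as the resolvent operator of $\Gamma''$, is bounded by the non-random quantity $\alpha^{-1}$. The same is true with $\Gamma_n''$ replacing $\Gamma''$.

Using $B^{-1} - A^{-1} = A^{-1}(A - B)B^{-1}$ for two invertible operators $A$ and $B$, we get:

$$\Gamma''^\dagger - \Gamma_n''^\dagger = (\Gamma_n'' + \alpha I)^{-1}\,(\Gamma_n'' - \Gamma'')\,(\Gamma'' + \alpha I)^{-1}$$

which entails

$$\|\Gamma_n''^\dagger - \Gamma''^\dagger\|_\infty \le \left\|(\Gamma_n'' + \alpha I)^{-1}\right\|_\infty \|\Gamma_n'' - \Gamma''\|_\infty \left\|(\Gamma'' + \alpha I)^{-1}\right\|_\infty \le \frac{1}{\alpha^2}\,\|\Gamma_n'' - \Gamma''\|_\infty = O_P\left(\frac{1}{\alpha^2\sqrt{n}}\right)$$

by Lemma 7.1.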

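Lemma 7.3 Let $S_{n,\phi}$ and $S_\phi$ be defined respectively by (11) and (8). Then

$$\|S_{n,\phi} - S_\phi\|_\infty = O_P\left(\frac{1}{\alpha^2\sqrt{n}}\right).$$

Proof: From equations (8) and (11) we obtain $S_{n,\phi} - S_\phi = \Gamma_n - \Gamma + \Gamma'^*(\Gamma''^\dagger)\Gamma' - \Gamma_n'^*(\Gamma_n''^\dagger)\Gamma_n'$ and

$$\|S_{n,\phi} - S_\phi\|_\infty \le \|\Gamma_n - \Gamma\|_\infty + \left\|\Gamma'^*(\Gamma''^\dagger)\Gamma' - \Gamma_n'^*(\Gamma_n''^\dagger)\Gamma_n'\right\|_\infty.$$

We look for a bound for

$$\left\|\Gamma'^*(\Gamma''^\dagger)\Gamma' - \Gamma_n'^*(\Gamma_n''^\dagger)\Gamma_n'\right\|_\infty \le \left\|\Gamma'^*(\Gamma''^\dagger)\Gamma' - \Gamma_n'^*(\Gamma''^\dagger)\Gamma'\right\|_\infty + \left\|\Gamma_n'^*(\Gamma''^\dagger)\Gamma' - \Gamma_n'^*(\Gamma''^\dagger)\Gamma_n'\right\|_\infty + \left\|\Gamma_n'^*(\Gamma''^\dagger)\Gamma_n' - \Gamma_n'^*(\Gamma_n''^\dagger)\Gamma_n'\right\|_\infty.$$

Obviously the first two terms above may be bounded in probability by

$$\|\Gamma''^\dagger\|_\infty\,\|\Gamma_n'\|_\infty\,\|\Gamma' - \Gamma_n'\|_\infty = O_P\left(\frac{1}{\alpha\sqrt{n}}\right)$$

since $\|\Gamma_n'^*\|_\infty = \|\Gamma_n'\|_\infty$. The remaining term may be bounded by

$$\|\Gamma_n'\|_\infty^2\,\|\Gamma''^\dagger - \Gamma_n''^\dagger\|_\infty = O_P\left(\|\Gamma''^\dagger - \Gamma_n''^\dagger\|_\infty\right)$$

and Lemma 7.2 finishes the proof.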



Lemma 7.4 The operator $S_\phi$ is positive, hence

$$\left\|(S_\phi + \beta I)^{-1}\right\|_\infty \le \frac{1}{\beta}.$$

Proof: Before starting the proof it is worth recalling the following fact. If $T$ is a compact operator from a Hilbert space $H_1$ to a Hilbert space $H_2$ it admits a Schmidt representation

$$T = \sum_{i=1}^{+\infty} s_i\,(u_i \otimes v_i)$$

where the $s_i^2$'s are the positive eigenvalues of $T^*T$ (i.e. of $TT^*$) and where $u_i$ (resp. $v_i$) denote a complete orthonormal system of $H_1$ (resp. $H_2$). We refer for instance to Theorem 1.1 page 96 in Gohberg, Goldberg, Kaashoek (1991). Now we turn to

$$S_\phi = \Gamma - \Gamma'^*\,\Gamma''^\dagger\,\Gamma'.$$

We set $U = D\Gamma^{1/2}$ (recall that $\Gamma' = D\Gamma$). The operator $U$ is compact from $W$ to $L$ since $D$ is bounded and $\Gamma^{1/2}$ is compact like $\Gamma$. Then $\Gamma'' = UU^*$ and $\Gamma''^\dagger = (UU^* + \alpha I)^{-1}$ and we rewrite

$$S_\phi = \Gamma^{1/2}\left(I - U^*(UU^* + \alpha I)^{-1}U\right)\Gamma^{1/2}. \tag{18}$$

Now let us write the Schmidt decomposition of $U$:

$$U = \sum_{i=1}^{+\infty} s_i\,(u_i \otimes_W v_i), \qquad u_i \in W,\ v_i \in L.$$

Easy computations lead to

$$I - U^*(UU^* + \alpha I)^{-1}U = \sum_{i=1}^{+\infty}\frac{\alpha}{\alpha + s_i^2}\,(u_i \otimes_W u_i).$$

From (18) we deduce that for all $x$ in $W$, $\langle S_\phi x, x\rangle_W \ge 0$, hence the announced result.
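The spectral identity used in this proof has a direct finite-dimensional analogue that can be checked numerically: for a real matrix $U$ with singular values $s_i$, the matrix $I - U^T(UU^T + \alpha I)^{-1}U$ has eigenvalues $\alpha/(\alpha + s_i^2)$, together with the eigenvalue 1 on the kernel of $U$. The NumPy sketch below illustrates this on a random matrix; the function name `penalized_gap` and the dimensions are arbitrary choices made for the example.

```python
import numpy as np


def penalized_gap(U, alpha):
    """Return I - U^T (U U^T + alpha I)^{-1} U for a real matrix U."""
    rows, cols = U.shape
    resolvent = np.linalg.inv(U @ U.T + alpha * np.eye(rows))
    return np.eye(cols) - U.T @ resolvent @ U


rng = np.random.default_rng(1)
U = rng.normal(size=(3, 4))   # maps a 4-dimensional space into a 3-dimensional one
alpha = 0.3
M = penalized_gap(U, alpha)
eigenvalues = np.linalg.eigvalsh(M)            # ascending order
singular_values = np.linalg.svd(U, compute_uv=False)
```

All eigenvalues lie in $(0, 1]$, which is the matrix counterpart of the positivity of the middle factor in (18).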
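Lemma 7.5

$$\left\|(S_{n,\phi} + \beta I)^{-1} - (S_\phi + \beta I)^{-1}\right\|_\infty = O_P\left(\frac{1}{\beta^2\alpha^2\sqrt{n}}\right).$$

Proof:

$$(S_{n,\phi} + \beta I)^{-1} - (S_\phi + \beta I)^{-1} = (S_\phi + \beta I)^{-1}\,(S_\phi - S_{n,\phi})\,(S_{n,\phi} + \beta I)^{-1} \tag{19}$$

hence

$$\left(I - (S_\phi + \beta I)^{-1}(S_\phi - S_{n,\phi})\right)(S_{n,\phi} + \beta I)^{-1} = (S_\phi + \beta I)^{-1}. \tag{20}$$

Since

$$\left\|(S_\phi + \beta I)^{-1}(S_\phi - S_{n,\phi})\right\|_\infty \le \frac{1}{\beta}\,\|S_\phi - S_{n,\phi}\|_\infty = O_P\left(\frac{1}{\beta\alpha^2\sqrt{n}}\right) \xrightarrow{P} 0,$$

the probability that $I - (S_\phi + \beta I)^{-1}(S_\phi - S_{n,\phi})$ is an invertible operator tends to 1. It suffices indeed that

$$\left\|(S_\phi + \beta I)^{-1}(S_\phi - S_{n,\phi})\right\|_\infty < 1$$

to write from (20):

$$(S_{n,\phi} + \beta I)^{-1} = \left(I - (S_\phi + \beta I)^{-1}(S_\phi - S_{n,\phi})\right)^{-1}(S_\phi + \beta I)^{-1}. \tag{21}$$

We set $H_n = (S_\phi + \beta I)^{-1}(S_\phi - S_{n,\phi})$, then

$$(I - H_n)^{-1} = I + \sum_{p=1}^{+\infty} H_n^p.$$

At last from (19) and (21):

$$(S_{n,\phi} + \beta I)^{-1} - (S_\phi + \beta I)^{-1} = \left(\sum_{p=1}^{+\infty} H_n^p\right)(S_\phi + \beta I)^{-1}$$

and

$$\left\|(S_{n,\phi} + \beta I)^{-1} - (S_\phi + \beta I)^{-1}\right\|_\infty \le \left\|(S_\phi + \beta I)^{-1}\right\|_\infty \sum_{p=1}^{+\infty}\|H_n\|_\infty^p = O_P\left(\frac{1}{\beta^2\alpha^2\sqrt{n}}\right)$$

by Lemmas 7.3 and 7.4.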

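Proof of Theorem 5.1: We prove the Theorem for $\phi_n$ since the same technique would lead to an analogous result for $\Psi_n$. Developing the expressions of $\Delta_n$ and $\Delta_n'$ yields

$$\Delta_n = \phi\Gamma_n + \Psi\Gamma_n' + U_n,$$
$$\Delta_n' = \phi\Gamma_n'^* + \Psi\Gamma_n'' + U_n',$$

with

$$U_n = \frac{1}{n}\sum_{k=1}^{n} X_k \otimes_W \varepsilon_{k+1}, \qquad U_n' = \frac{1}{n}\sum_{k=1}^{n} X_k' \otimes_L \varepsilon_{k+1}.$$

Hence

$$\begin{aligned} T_{n,\phi} &= \Delta_n - \Delta_n'(\Gamma_n''^\dagger)\Gamma_n' \\ &= \phi\Gamma_n + \Psi\Gamma_n' - \phi\Gamma_n'^*(\Gamma_n''^\dagger)\Gamma_n' - \Psi\Gamma_n''(\Gamma_n''^\dagger)\Gamma_n' + U_n - U_n'(\Gamma_n''^\dagger)\Gamma_n' \\ &= \phi S_{n,\phi} + \Psi\left[\Gamma_n' - \Gamma_n''(\Gamma_n''^\dagger)\Gamma_n'\right] + \left[U_n - U_n'(\Gamma_n''^\dagger)\Gamma_n'\right]. \end{aligned}$$

At last, since $\phi_n = T_{n,\phi}(S_{n,\phi} + \beta I)^{-1}$ and $S_{n,\phi}(S_{n,\phi} + \beta I)^{-1} = I - \beta(S_{n,\phi} + \beta I)^{-1}$,

$$\phi_n - \phi = -\beta\phi\,(S_{n,\phi} + \beta I)^{-1} + \Psi\left[\Gamma_n' - \Gamma_n''(\Gamma_n''^\dagger)\Gamma_n'\right](S_{n,\phi} + \beta I)^{-1} + \left[U_n - U_n'(\Gamma_n''^\dagger)\Gamma_n'\right](S_{n,\phi} + \beta I)^{-1}. \tag{22}$$

The proof will be achieved if we prove that the three terms in the display above tend to zero in probability. The next three Propositions, namely Propositions 7.1, 7.2 and 7.3, are devoted to this goal. We begin with the last term, involving $U_n$ and $U_n'$. We first need the following auxiliary Lemma.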



Lemma 7.6

$$\|U_n\|_\infty = O_P\left(\frac{1}{\sqrt{n}}\right), \tag{23}$$

$$\|U_n'\|_\infty = O_P\left(\frac{1}{\sqrt{n}}\right), \tag{24}$$

$$\left\|U_n'(\Gamma_n''^\dagger)\Gamma_n'\right\|_\infty = O_P\left(\frac{1}{\alpha\sqrt{n}}\right). \tag{25}$$

Proof: The proof of (23) and (24) is obvious since $U_n$ and $U_n'$ are sums of uncorrelated random operators (here uncorrelated means that the cross covariance operator between two distinct random elements is the null operator). Then

$$\left\|U_n'(\Gamma_n''^\dagger)\Gamma_n'\right\|_\infty \le \|U_n'\|_\infty\,\|\Gamma_n''^\dagger\|_\infty\,\|\Gamma_n'\|_\infty$$

where the last term on the right side is bounded in probability, the first is an $O_P(n^{-1/2})$ and the norm of the second is almost surely bounded by $\alpha^{-1}$, and (25) is proved.

Proposition 7.1

$$\left[U_n - U_n'(\Gamma_n''^\dagger)\Gamma_n'\right](S_{n,\phi} + \beta I)^{-1} = o_P(1).$$

Proof: The proof of the Proposition is a consequence of Lemmas 7.4 and 7.6.

We turn to the first term in (22).



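Proposition 7.2 If $\beta_n \to 0$ and $\alpha_n$, $\beta_n$ satisfy the conditions of Theorem 5.1,

$$\left\|\beta\phi\,(S_{n,\phi} + \beta I)^{-1}\right\|_\infty \xrightarrow{P} 0.$$

Proof: We invoke Lemma 7.5 to claim that it suffices to drop the index $n$ in the Proposition and to prove that:

$$\left\|\beta\phi\,(S_\phi + \beta I)^{-1}\right\|_\infty \to 0.$$

In fact Lemma 7.5 links the asymptotic behavior of $(S_{n,\phi} + \beta I)^{-1}$ and $(S_\phi + \beta I)^{-1}$. Remember that $\phi$ is a compact operator from $W$ to $W$. This fact is crucial. It implies that we just have to prove that, for all $x$ in $W$,

$$\beta\,(S_\phi + \beta I)^{-1}x \to 0.$$

By Lemma 7.4 this fact is straightforward. Indeed it was then proved that $S_\phi$ is a selfadjoint positive operator, hence admits the spectral decomposition

$$S_\phi = \sum_{i=1}^{+\infty}\mu_i\,(t_i \otimes_W t_i)$$

where the $\mu_i$'s are the positive eigenvalues of $S_\phi$ arranged in decreasing order and the $t_i$'s are the associated eigenvectors. Then if $x = \sum_i x_i t_i$ where $x_i = \langle x, t_i\rangle_W$ we easily get

$$\left\|\beta\,(S_\phi + \beta I)^{-1}x\right\|_W^2 = \sum_{i=1}^{+\infty}\frac{\beta^2}{(\mu_i + \beta)^2}\,x_i^2.$$

For fixed $i$, $\frac{\beta^2}{(\mu_i + \beta)^2} \to 0$ as $\beta \to 0$, $\sup_i \left|\frac{\beta^2}{(\mu_i + \beta)^2}\right| \le 1$ and as $\sum_i \langle x, t_i\rangle_W^2 < +\infty$, applying Lebesgue's dominated convergence Theorem yields $\beta\,(S_\phi + \beta I)^{-1}x \to 0$ in $W$.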

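Let us deal with the second term in (22).

Proposition 7.3 If $\beta \to 0$ and $\sqrt{\alpha}/\beta$ decays to zero,

$$\left\|\Psi\left[\Gamma_n' - \Gamma_n''(\Gamma_n''^\dagger)\Gamma_n'\right](S_{n,\phi} + \beta I)^{-1}\right\|_\infty \xrightarrow{P} 0.$$

Proof: Once more it suffices to prove that

$$\frac{1}{\beta}\left\|\Gamma_n' - \Gamma_n''(\Gamma_n''^\dagger)\Gamma_n'\right\|_\infty \xrightarrow{P} 0$$

since

$$\left\|\Psi\left[\Gamma_n' - \Gamma_n''(\Gamma_n''^\dagger)\Gamma_n'\right](S_{n,\phi} + \beta I)^{-1}\right\|_\infty \le \|\Psi\|_\infty\,\frac{\left\|\Gamma_n' - \Gamma_n''(\Gamma_n''^\dagger)\Gamma_n'\right\|_\infty}{\beta}.$$

We keep on replacing the random operators based on the sample by their limits.

$$\left\|\Gamma_n' - \Gamma_n''(\Gamma_n''^\dagger)\Gamma_n'\right\|_\infty = \left\|\alpha(\Gamma_n''^\dagger)\Gamma_n'\right\|_\infty \le \alpha\,\|\Gamma_n''^\dagger\|_\infty\,\|\Gamma_n' - \Gamma'\|_\infty + \alpha\,\|\Gamma_n''^\dagger - \Gamma''^\dagger\|_\infty\,\|\Gamma'\|_\infty + \alpha\left\|(\Gamma''^\dagger)\Gamma'\right\|_\infty. \tag{26}$$

By Lemma 7.1 the first term is an $O_P\left(\frac{1}{\sqrt{n}}\right)$, and by Lemma 7.2 the second is an $O_P\left(\frac{1}{\alpha\sqrt{n}}\right)$.

The last term is totally deterministic and we are going to prove that it is an $O(\sqrt{\alpha})$. Once again we introduce the compact operator $U = D\Gamma^{1/2}$. We see that

$$\alpha\,\Gamma''^\dagger\Gamma' = \alpha\,(\Gamma''^\dagger)^{1/2}\,(UU^* + \alpha I)^{-1/2}\,U\,\Gamma^{1/2}$$

since $\Gamma'' = UU^*$. First we show that $(UU^* + \alpha I)^{-1/2}U$ is a class of operators uniformly bounded with respect to $\alpha$. In fact, introducing the Schmidt representation for $U$ from Lemma 7.4 we get

$$(UU^* + \alpha I)^{-1/2}U = \sum_{i=1}^{+\infty}\frac{s_i}{\sqrt{s_i^2 + \alpha}}\,(u_i \otimes_W v_i)$$

and

$$\sup_{\alpha > 0}\left\|(UU^* + \alpha I)^{-1/2}U\right\|_\infty \le \sup_{\alpha, i}\frac{s_i}{\sqrt{s_i^2 + \alpha}} \le 1.$$

At last, noting that $\alpha\left\|(\Gamma''^\dagger)^{1/2}\right\|_\infty \le \sqrt{\alpha}$, so that the last term is an $O(\sqrt{\alpha})$, and taking into account (26), the proof of Proposition 7.3 is finished.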
Figure 1: Wong process simulated on [0, 192.65]

                 ARH approach                      ARHD approach
          ARH       ARF       ARW       $\alpha = 0.3$, $\beta = 0.65$    $\alpha = 0.1$, $\beta = 0.5$
MSE       0.624     0.622     0.623     0.327                             0.323
RMAE      1.580     1.599     1.599     1.223                             1.125

Table 1: Mean of MSE and RMAE error for the 50 simulations.



Error of predictions 86

Predictor                               MSE       RMAE
Wavelet II                              0.063     0.89%
FAR                                     0.065     0.89%
ARHD $\alpha = 0.1$, $\beta = 0.4$      0.167     1.25%
Wavelet III                             0.191     1.20%
ARHD $\alpha = 0.4$, $\beta = 0.8$      0.219     1.33%
ARH(1) $k_n = 1$                        0.278     1.60%
SARIMA                                  1.457     3.72%

Table 2: MSE and RMAE for the prediction of El Niño surface temperatures during 1986 for various methods.
Figure 2: Prediction of the 105-th sample path by ARH, ARF, ARW and ARHD method.

Figure 3: The monthly mean Niño-3 sea surface temperature index from 1950 until 1996.

Figure 4: Evolution of Niño-3 surface temperature during 1986 and its various predictions.



Mean Error of predictions 87-96

Predictor                               MSE       RMAE
ARHD $\alpha = 0.4$, $\beta = 0.8$      0.53      2.1%
Local FAR                               0.53      2.2%
ARHD $\alpha = 0.1$, $\beta = 0.4$      0.53      2.2%
FAR                                     0.55      2.3%
ARH(1) $k_n = 1$                        0.68      2.4%
SARIMA                                  1.45      3.7%

Table 3: Mean value of MSE and RMAE errors for prediction of SST from 1987 to 1996.